Single-equivalent formant prenormalizer utilizing feedback



United States Patent Office 3,522,376
Patented July 28, 1970

SINGLE-EQUIVALENT FORMANT PRENORMALIZER UTILIZING FEEDBACK
Louis R. Focht, Huntingdon Valley, and Charles F. Teacher, Philadelphia, Pa., assignors to Philco-Ford Corporation, Philadelphia, Pa., a corporation of Delaware
Filed Feb. 2, 1968, Ser. No. 702,623
Int. Cl. G01l 1/00
U.S. Cl. 179-1
5 Claims

ABSTRACT OF THE DISCLOSURE

A single-equivalent formant speech analysis system in which the amplitudes of the low frequency formants of a sound are decreased when a dominant high frequency formant is detected or increased when a dominant low frequency formant is detected. This is accomplished by using the output of the single-equivalent formant detector as a feedback signal to regulate the amplitude of the low frequency formants.

by a single signal the amplitude of which is representative of the frequency of the dominant formant of the sound. As used herein, the dominant formant of a sound is the formant of largest amplitude.

It has been found that for some speakers, the amplitudes of the high frequency formants of some sounds tend to deviate from the optimum or standard (based on statistical determinations) for which the aforementioned single-equivalent formant speech analysis system is calibrated. This deviation can produce errors in operation and hence makes the analyzer unreliable for these speakers.

It is therefore an object of the present invention to provide a single-equivalent formant speech analysis system of the type described in the aforementioned copending application which will respond reliably to a wider range of speech sounds.

It is a further object of the present invention to provide a single-equivalent formant speech analysis system of the type described in the aforementioned copending application capable of responding reliably to all speakers.

According to the present invention, the differences between the amplitudes of the high frequency formants of a sound and the amplitudes of its low frequency formants are increased whenever a dominant high frequency formant is detected. This increase is controlled in a feedback arrangement by a signal having an amplitude representative of the frequency of the single-equivalent formant of the sound.

In a preferred embodiment of the present invention an electrical representation of a speech wave is supplied to a high-pass filter channel and to a low-pass filter channel. The high-pass filter channel passes information unaltered to a voltage summation network, the output of which is supplied to a single-equivalent formant frequency analyzer of the type described in the aforementioned copending application. The output of the low-pass filter channel, on the other hand, is supplied to a variable gain network, the output of which is supplied also to the voltage summation network. The gain of the variable gain network is controlled by the output of the single-equivalent formant frequency analyzer in a feedback arrangement.

The above objects and other objects inherent in the present invention will become more apparent when considered in conjunction with the following specification and drawings in which:

FIG. 1 is a block diagram of a single-equivalent formant frequency analyzer in accordance with the present invention;

FIG. 2 is a graph showing the relative formant amplitudes for the vowel sounds u (boot) and i (eve); and

FIG. 3 is a waveform representation of the single-equivalent formant frequency voltage when the word buoy is uttered.

Referring now to the drawings, the block diagram of FIG. 1 shows a single-equivalent formant frequency analysis system in accordance with the present invention. An electrical representation of a speech wave, such as produced by a standard telephone carbon microphone (not shown), is supplied to a low-pass filter 2 and to a high-pass filter 4. In accordance with a preferred embodiment of the present invention filter 2 is designed to pass energy in a frequency band extending from approximately cycles per second (the lower limit of ordinary telephone transmission) to 1500 cycles per second and filter 4 is designed to pass energy in a frequency band extending from approximately 1500 cycles per second to approximately 3200 cycles per second (the upper limit of ordinary telephone transmission). It is not intended that the invention be limited to the frequency bands set forth above.

Filter 4 is coupled through a voltage summation network 6 of a type well known in the electronics art, for example a simple resistive adder network, to a single-equivalent formant frequency detector 8. Detector 8 produces a signal the amplitude of which is representative of the single-equivalent formant frequency of the input speech wave by measuring the period of the first major oscillation of the input speech wave. Detector 8 may consist of a bistable switching device coupled to a pulse width to amplitude converter. The construction and operation of detector 8 are described in detail in the aforementioned copending application.

Filter 2 is coupled through a variable gain (multiplier) network 10 to a second input of voltage summation network 6. The gain of network 10 is controlled by the output signal of detector 8 which is applied to network 10 by connection 12. The direction of the change in gain of network 10 corresponds directly to the direction of the change in amplitude of the output signal of detector 8. That is, the gain of network 10 will be decreased when the amplitude of the output signal from detector 8 decreases and will be increased when the amplitude of the output signal from detector 8 increases. A suitable multiplier network for the system of the present invention is illustrated (FIG. 3) and described in U.S. Pat. No. 3,017,019, issued to V. R. Briggs on Jan. 16, 1962, entitled "Pulse Width Signal Multiplying System."
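The signal path of FIG. 1 can be sketched in a few lines of Python. This is a minimal illustration and not the circuit itself: the linear gain law, the constant GAIN_PER_VOLT, and the function names are assumptions made for the example. The specification requires only that the gain of network 10 move in the same direction as the output of detector 8.

```python
# Hypothetical sketch of the FIG. 1 signal path.
# The linear gain law and all names below are assumptions for illustration.

GAIN_PER_VOLT = 1.0  # assumed scale factor between detector output and gain


def network_gain(detector_output):
    """Gain of variable gain network 10; rises and falls with the detector output."""
    return GAIN_PER_VOLT * max(detector_output, 0.0)


def summation_output(low_band_sample, high_band_sample, detector_output):
    """Output of summation network 6: high band unaltered plus scaled low band."""
    return high_band_sample + network_gain(detector_output) * low_band_sample


# A lower detector output (higher dominant formant frequency) attenuates
# only the low-band contribution; the high band passes unchanged.
strong_low = summation_output(1.0, 0.5, detector_output=1.0)
weak_low = summation_output(1.0, 0.5, detector_output=0.2)
```

Any monotonically increasing gain law would satisfy the description equally well; the linear one is used here only because it is the simplest.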

The advantages achieved by the circuit of FIG. 1 will be apparent when the circuit of FIG. 1 is analyzed in conjunction with the articulation of the word buoy. This word contains the vowel sounds u (boot) and i (eve). FIG. 2 shows the frequencies of the first three formants of these vowel sounds plotted against their relative formant amplitudes in db after a 9 db per octave high frequency emphasis. The 9 db per octave high frequency emphasis is necessary to illustrate the effect of the formants on the human hearing mechanism because it is believed that a high frequency emphasis of approximately 9 db per octave is performed in the human hearing mechanism. In FIG. 2 the solid line 22 illustrates that the amplitude of the first formant F1 of the vowel sound u (boot) is larger than the amplitude of the second and third formants F2 and F3 for this vowel sound while the solid line 24 illustrates that the amplitudes of the second and third formants F2 and F3 of the vowel sound i (eve) are substantially larger (for most speakers) than the first formant F1 for this vowel sound.

As explained in the abovementioned copending application, the amplitude of the signal from formant frequency detector 8 is inversely proportional to the frequency of the dominant formant. Therefore, ideally, when the word buoy is spoken, the output of detector 8 will have the waveform indicated as A in FIG. 3. Region x of waveform A represents articulation of the vowel sound u (boot) of the word buoy. Since the first formant F1 of this sound is of greater amplitude than the second and third formants F2 and F3 of this sound (FIG. 2), region x is of relatively high amplitude. Region y of waveform A represents articulation of the vowel sound i (eve) of the word buoy. Since the largest amplitude formant F2 of this sound is of greater frequency than that of the largest amplitude formant F1 of the vowel sound u (FIG. 2), region y is of smaller amplitude than region x.
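The relationship between regions x and y can be illustrated with an idealized model of detector 8 that selects the largest-amplitude formant and returns a value inversely proportional to its frequency. The constant K and every formant frequency and amplitude below are assumed values chosen for illustration; they are not read from FIG. 2.

```python
# Idealized model of detector 8 (an assumption, not the circuit of the
# copending application): output = K / frequency of the dominant formant.

K = 300.0  # assumed constant, so a 300 cps dominant formant gives 1.0


def detector_output(formants):
    """formants: list of (frequency_cps, amplitude_db) pairs."""
    dominant_freq, _ = max(formants, key=lambda f: f[1])
    return K / dominant_freq


# Assumed formant sets: F1 dominant for u (boot), F2 dominant for i (eve).
u_boot = [(300.0, 0.0), (900.0, -10.0), (2200.0, -30.0)]
i_eve = [(300.0, -12.0), (2300.0, 0.0), (3000.0, -2.0)]

region_x = detector_output(u_boot)  # dominant low formant, high output
region_y = detector_output(i_eve)   # dominant high formant, lower output
```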

For some speakers, the amplitudes of the dominant high frequency formants for some vowel sounds are lower than those of most speakers. The formant amplitude-frequency distribution of the vowel sound i for such a speaker may be as shown by the dashed curve 26 in FIG. 2. When a signal having this formant amplitude-frequency distribution is supplied to detector 8, detector 8 usually makes the initial determination that the second formant F2 is dominant, that is, of largest amplitude, but, due to the small difference between the amplitudes of the first and second formants F1 and F2, this determination is often only temporary and detector 8 often produces an oscillating output signal such as waveform B of FIG. 3. Waveform B indicates that the detector 8 is designating in a random fashion the second formant F2 and then the first formant F1 as the dominant formant.

According to the present invention this random fluctuation of the signal representative of the single-equivalent formant frequency is corrected by decreasing the amplitude of the low frequency formants of a sound when a dominant high frequency formant is detected. This correction is achieved by using the output of detector 8 as a gain control for variable gain network 10. When a dominant high frequency formant is detected initially (represented by a small amplitude signal from detector 8), the gain of network 10 is decreased, thus decreasing the amplitude of the low frequency formants of the signal supplied to detector 8 relative to the high frequency formants of this signal (dotted curve 28 of FIG. 2). This increases the amplitude difference between the high and low frequency formants, providing prolonged detection of the high frequency formants as the dominant formants and thereby increasing the stability and reliability of the analysis system for all speakers.
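The widening of the amplitude difference can be checked numerically with a small sketch. The model below is an assumption throughout: the detector is idealized as in the specification's inverse-frequency description, the gain of network 10 is taken equal to the detector output, the loop starts at unity gain, and the formant values for the marginal speaker are invented to resemble curve 26 rather than taken from it.

```python
import math

K = 300.0            # assumed detector constant (output 1.0 at 300 cps)
CROSSOVER = 1500.0   # boundary between the passbands of filter 2 and filter 4


def detect(formants):
    """Idealized detector 8: K divided by the frequency of the largest formant."""
    freq, _ = max(formants, key=lambda f: f[1])
    return K / freq


def apply_gain(formants, gain):
    """Scale only the formants below the crossover (the low-pass channel)."""
    shift_db = 20.0 * math.log10(gain)
    return [(f, a + shift_db if f < CROSSOVER else a) for f, a in formants]


def margin_db(formants):
    """Lead of the strongest high-band formant over the strongest low-band one."""
    high = max(a for f, a in formants if f >= CROSSOVER)
    low = max(a for f, a in formants if f < CROSSOVER)
    return high - low


# Assumed marginal speaker: F2 leads F1 by only 1 db.
marginal_i = [(300.0, -1.0), (2300.0, 0.0), (3000.0, -3.0)]

before = margin_db(marginal_i)                    # margin at unity gain
gain = detect(marginal_i)                         # small output gives small gain
after = margin_db(apply_gain(marginal_i, gain))   # margin with feedback applied
```

Under these assumptions the margin grows from 1 db to more than 18 db, so a small disturbance in either formant amplitude can no longer flip the detector's decision.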

This decrease in the amplitude of the low frequency formants of a sound in order to detect a dominant high frequency formant of the same sound does not prevent the subsequent detection of a dominant low frequency formant of a different sound because the amplitude of the latter formant will generally be high enough so that it produces an increase in the amplitude of the output signal of detector 8 even though its amplitude is decreased initially by network 10. This increase in the amplitude of the output signal of detector 8 increases the gain of network 10 which further increases the amplitude of the output signal of detector 8. The result of these increases is an output signal from detector 8 which represents articulation of a sound having a dominant low frequency formant.
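The recovery described above can be sketched by iterating around the loop, starting from the low gain left by a preceding sound with a dominant high frequency formant. As before, the detector model, the assumption that the gain of network 10 equals the detector output, and the formant values are hypothetical. Because the model picks the dominant formant outright, it settles in a single step, whereas the specification describes a progressive increase.

```python
import math

K = 300.0            # assumed detector constant (output 1.0 at 300 cps)
CROSSOVER = 1500.0   # boundary between the passbands of filter 2 and filter 4


def detect(formants):
    """Idealized detector 8: K divided by the frequency of the largest formant."""
    freq, _ = max(formants, key=lambda f: f[1])
    return K / freq


def loop_step(formants, gain):
    """One pass around the loop: attenuate the low band, detect, return new gain."""
    shift_db = 20.0 * math.log10(gain)
    shaped = [(f, a + shift_db if f < CROSSOVER else a) for f, a in formants]
    return detect(shaped)  # assumed: gain of network 10 equals detector output


# Assumed u (boot) formants: strong F1, weak F3.
u_boot = [(300.0, 0.0), (900.0, -10.0), (2200.0, -30.0)]

gain = K / 2300.0   # low gain left over from a preceding i (eve) sound
history = [gain]
for _ in range(3):
    gain = loop_step(u_boot, gain)
    history.append(gain)
```

The sketch depends on the low frequency formant remaining the largest after attenuation, which is the condition the specification states with the words "generally high enough".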

While the invention has been described with reference to a particular embodiment thereof, it will be apparent that various modifications and other embodiments thereof will occur to those skilled in the art within the scope of the invention. Accordingly, we desire the scope of our invention to be limited only by the appended claims.

What we claim is:

1. An improved system for analyzing a speech wave comprising first means for supplying an electrical representation of an acoustic speech wave, the formants of said speech wave having a given frequency-amplitude relationship; second means for producing a signal representative of the frequency of the single-equivalent formant of a speech wave applied thereto; and third means coupled between said first means and said second means, said third means including control means for changing the frequency-amplitude relationship of the formants of said speech wave supplied to said second means whenever a signal indicative of the presence in said speech wave of a dominant high frequency formant appears at the output of said second means.

2. The system of claim 1 wherein said control means increases the difference between the amplitudes of the high frequency formants of said speech wave and the amplitudes of the low frequency formants of said speech wave in response to said signal indicative of the presence in said speech wave of a dominant high frequency formant.

3. The system of claim 2 wherein said control means includes a low pass filter coupled to said first means, a high pass filter coupled to said first means, a variable gain network having an input coupled to said low pass filter and an input coupled to the output of said second means, and a signal summation network having inputs coupled to the outputs of said high pass filter and said variable gain network, and an output coupled to said second means.

4. The system of claim 3 wherein said low pass filter has a bandpass extending from approximately cycles per second to approximately 1500 cycles per second and said high pass filter has a bandpass extending from approximately 1500 cycles per second to approximately 3200 cycles per second.

5. The system of claim 1 wherein said control means decreases the amplitudes of the low-frequency formants of said speech wave in response to said signal indicative of the presence in said speech wave of a dominant high frequency formant.

No references cited.

KATHLEEN H. CLAFFY, Primary Examiner
J. B. LEAHEEY, Assistant Examiner

